Methods for single-molecule analysis

ABSTRACT

Methods for single-molecule preparation and analysis are disclosed herein. The methods can, for example, be used for isolating and analyzing DNA from various biological samples.

BACKGROUND

1. Technical Field

The present invention relates to the field of nanotechnology and to the field of single molecule genomic analysis.

2. Description of the Related Art

Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data. However, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences and facilitate unambiguous assembly. Plant genomes are notorious for containing high quantities of repetitive elements, which, combined with huge genome sizes, make accurate assembly of these large and complex genomes intractable.

Accurate de novo assembly of sequence reads represents the weak link in genome projects despite advances in high-throughput sequencing [1,2]. There are two general steps in genome sequence assembly, generation of sequence contigs and scaffolds, and their anchoring on genome-wide, lower resolution maps. NGS platforms generate sequence reads ranging from 25 to more than 500 bases [3], while reads of up to 1000 bases can be obtained by Sanger sequencing with high accuracy. NGS reads are often too short for unambiguous assembly. Paired-end reads can bridge contigs into scaffolds, but there are often gaps within the scaffolds. To order contigs and scaffolds, high-resolution genomic maps from an independent technology platform are needed. They may be of chromosomal scale, i.e., genetic maps, or regional scale, i.e., contigs of bacterial artificial chromosomes (BACs) or fosmids [4]. Contigs and scaffolds may be difficult to map if they are too short compared to the map resolution. For example, maps may have a resolution of 50-150 kb, while many contigs and scaffolds may only span a few kilobases. Additionally, there are errors in the contigs and scaffolds themselves, often due to misassembly of repeat sequences. Typical medium to large genomes contain 40-85% repetitive sequences [5-8], dramatically hindering effective de novo sequence assembly.

Genome finishing has relied on guidance of a physical map for large and complex genomes, including human, Arabidopsis [9], rice [10] and maize [11,12]. BAC-based restriction fragment physical mapping of complex genomes is fairly robust because even in the presence of interspersed repeat sequences along the BAC inserts (typically 100-220 kb long), a unique pattern of restriction fragments is generated. State of the art technologies for physical map construction include SNaPshot [13,14], whole-genome profiling [15,16], optical mapping [17,18], and genome mapping [19]. SNaPshot is a restriction fingerprinting method which uses one or more restriction enzymes and fluorescent labels followed by separation of fragments by capillary electrophoresis. SNaPshot has been used for physical mapping of wheat and other genomes [14,20]. Optical mapping provides an additional layer of information by retaining the physical order of restriction sites along DNA molecules immobilized on a surface [18]. It has been applied to the maize and rice genomes [11,21]. One can validate a sequence assembly by comparing in silico sequence motif maps to consensus optical maps [22-25]. However, information density for optical maps is only about one site per 20 kb, and the technology is limited in utility by high error rates, non-uniform DNA linearization, and low throughput. Therefore, a high-resolution (e.g., <5 kb) DNA sequencing-independent mapping method that can overcome these constraints of optical mapping is much needed.

SUMMARY OF THE INVENTION

Methods for preparing samples and performing single molecule analysis, including methods of mitigating the effects of fragile sites and improving information density for genome mapping, are provided herein.

In an embodiment, a method of characterizing a DNA is provided, comprising: nicking a first DNA at a first sequence motif, wherein the first DNA is double stranded, and wherein the first DNA remains double-stranded adjacent to the nicks; labeling the nicks on the first DNA with a first label; repairing the nicks on the first DNA; marking the repaired first DNA with a second label, wherein the second label is non-sequence-specific, and wherein the second label is different from the first label; linearizing the first DNA following labeling with the first and second labels; and detecting the pattern of the first label on the linearized first DNA.

In an embodiment, a method of characterizing DNA is provided, comprising: nicking a first DNA at a first sequence motif, wherein the first DNA is double stranded, and wherein the first DNA remains double-stranded adjacent to the nicks; labeling the nicks on the first DNA with a first label; repairing the nicks on the first DNA following labeling with the first label; nicking the repaired first DNA at a second sequence motif, wherein the repaired first DNA remains double-stranded adjacent to the nicks; labeling the nicks at the second sequence motif on the first DNA with a second label; repairing the nicks on the first DNA following labeling with the second label; marking the first DNA with a third label, wherein the third label is non-sequence-specific, and wherein the third label is different from the first and second labels; linearizing the first DNA following labeling with the third label; detecting the pattern of at least one of the first and second labels on the first linearized DNA.

In an embodiment, a method of characterizing DNA is provided, comprising: nicking one strand of a first DNA at a recognition sequence with a first nicking endonuclease, wherein the first DNA is double stranded, and wherein the first DNA remains double-stranded adjacent to the nicks; labeling the first DNA at the nicking sites with a first label; repairing the nicks on the first DNA; nicking the complementary strand of a second DNA at the recognition sequence with a second nicking endonuclease, wherein the second DNA is double stranded, and wherein the second DNA remains double-stranded adjacent to the nicks; labeling the second DNA at the nicking sites with a second label; and repairing the nicks on the second DNA.

In some embodiments, the methods described herein further comprise: nicking one strand of a second DNA at a recognition sequence with the first nicking endonuclease, wherein the second DNA is double stranded, and wherein the second DNA remains double-stranded adjacent to the nicks; labeling the second DNA at the nicking sites; repairing the nicks on the second DNA; nicking the complementary strand of the second DNA at the recognition sequence with the second nicking endonuclease; labeling the second DNA at the nicking sites; repairing the nicks on the second DNA; and marking the repaired first and second DNAs with a third label, wherein the third label is a non-sequence-specific label.

In an embodiment, a method of characterizing DNA is provided, comprising: nicking a first DNA at a first sequence motif, wherein the first DNA is double stranded, and wherein the first DNA remains double-stranded adjacent to the nicks; labeling the nicks on the first DNA with a first label; repairing the nicks on the first DNA; tagging the first DNA at a second sequence motif with a second label, wherein the second label does not cut DNA; marking the first DNA with a third label, wherein the third label is a non-sequence-specific label, and wherein the third label is different from the first and second labels; linearizing the first DNA following labeling with the first, second, and third labels; and detecting the first and second labels on the linearized first DNA.

In an embodiment, a method of characterizing DNA is provided, comprising: treating a double-stranded DNA comprising at least one flap on either strand of the DNA with a 5′ to 3′ exonuclease activity of a polymerase under conditions wherein at least one species of dNTP is present in limited concentration compared to other dNTPs that are present; ligating the nicks to restore strand integrity at the flap regions; and characterizing the DNA.

In an embodiment, a method of characterizing DNA is provided, comprising: treating a double-stranded DNA comprising at least one flap on either strand of the DNA with a 5′ to 3′ exonuclease activity of a polymerase under conditions wherein at least one species of dNTP is omitted; ligating the nicks to restore strand integrity at the flap regions; and characterizing the DNA.

In some embodiments, the methods described herein further comprise: nicking a second DNA at the first sequence motif; labeling the nicks on the second DNA with the first label; repairing the nicks on the second DNA; marking the repaired second DNA with the second label; linearizing the second DNA following labeling with the first and second labels; and detecting the pattern of the first or second label on the linearized second DNA.

In some embodiments, the methods described herein further comprise: nicking a second DNA at the first sequence motif, wherein the second DNA is double stranded, and wherein the second DNA remains double-stranded adjacent to the nicks; labeling the nicks on the second DNA with the first label; repairing the nicks on the second DNA following labeling with the first label; nicking the repaired second DNA at the second sequence motif, wherein the repaired second DNA remains double-stranded adjacent to the nicks; labeling the nicks at the second sequence motif on the second DNA with the second label; repairing the nicks on the second DNA following labeling with the second label; marking the second DNA with the third label; linearizing the second DNA following labeling with the third label; and detecting the pattern of at least one of the first and second labels on the second linearized DNA.

In some embodiments, the methods described herein further comprise comparing the pattern of the first label on the first DNA to the pattern of the first label on the second DNA. In some embodiments, the methods described herein further comprise: assembling a plurality of first DNAs using overlap of the labeled sequence motifs to construct a first DNA map; assembling a plurality of second DNAs using overlap of the labeled sequence motifs to construct a second DNA map; and comparing the first DNA map to the second DNA map.

In some embodiments, the methods described herein further comprise: marking the repaired first and second DNAs with a third label, wherein the third label is a non-sequence-specific label. In some embodiments, the methods described herein further comprise: linearizing the first and second DNAs; detecting the first and second labels on the linearized DNA; and assembling the labeled DNA molecules using overlap of the labeled sequence motifs to construct a DNA map. In some embodiments, the first and second labels are the same label.

In some embodiments, the methods described herein further comprise: nicking a second DNA at the first sequence motif, wherein the second DNA is double stranded, and wherein the second DNA remains double-stranded adjacent to the nicks; labeling the nicks on the second DNA with the first label; repairing the nicks on the second DNA; tagging the second DNA at the second motif with the second label; marking the second DNA with the third label; linearizing the second DNA following labeling with the first and second labels; and detecting the first and second labels on the linearized second DNA.

In some embodiments, the linearizing includes transporting the DNA into a nanochannel. In some embodiments, the methods described herein further comprise comparing the pattern of at least one of the first or second labels on the first DNA to a pattern of labels on a reference DNA. In some embodiments, the methods described herein further comprise comparing the pattern of the first label on the first DNA to a pattern of labels on a reference DNA. In some embodiments, the methods described herein further comprise comparing the pattern of the second label on the first DNA to a pattern of labels on a reference DNA, wherein the second label is a sequence specific label. In some embodiments, the methods described herein further comprise assembling the labeled first DNA using the pattern of labeled motifs to construct a first DNA map. In some embodiments, the methods described herein further comprise assembling the labeled second DNA using the pattern of labeled motifs to construct a first DNA map. In some embodiments, the second label is a non-sequence-specific label. In some embodiments, the second sequence motif includes at least one binding site for a DNA binding entity selected from the group consisting of a non-cutting restriction enzyme, a zinc finger protein, an antibody, a transcription factor, a transcription activator like domain, a DNA binding protein, a polyamide, a triple helix forming oligonucleotide, and a peptide nucleic acid, wherein the tagging is effected with the binding entity comprising the second label, and wherein the second label is selected from the group consisting of a fluorophore, a quantum dot, a dendrimer, a nanowire, a bead, a hapten, streptavidin, avidin, neutravidin, biotin, and a stabilized reactive group.
In some embodiments, the second sequence motif includes at least one binding site for a peptide nucleic acid, wherein the tagging is performed with the peptide nucleic acid comprising the second label, and wherein the second label is a fluorophore or a quantum dot. In some embodiments, the second sequence motif includes at least one binding site for a methyltransferase, and wherein tagging is performed with the methyltransferase comprising a modified cofactor which includes the second label. In some embodiments, the first and second labels are independently selected from the group consisting of a fluorophore, a quantum dot, a dendrimer, a nanowire, a bead, a hapten, a streptavidin, an avidin, a neutravidin, a biotin, and a reactive group. In some embodiments, the first and second labels are independently selected from the group consisting of a fluorophore or a quantum dot. In some embodiments, the labeling is carried out with a polymerase. In some embodiments, the labeling is carried out with a polymerase in the presence of dNTPs comprising the label. In some embodiments, the polymerase has a 5′ to 3′ exonuclease activity. In some embodiments, the polymerase leaves a flap region, and wherein the flap region is removed to restore a ligatable nick prior to the repairing with a ligase. In some embodiments, the flap region is removed using the 5′ to 3′ exonuclease activity of a polymerase under conditions wherein at least one nucleotide is present in limited concentration. In some embodiments, the flap region is removed using the 5′ to 3′ exonuclease activity of a polymerase under conditions wherein at least one nucleotide is omitted from the reaction. In some embodiments, the flap region is removed with a flap endonuclease. In some embodiments, the labeling is carried out with a polymerase in the presence of at least one species of dNTP. In some embodiments, the at least one species of dNTP is a single species of dNTP. 
In some embodiments, activity of the polymerase is modulated by adjusting the temperature, dNTP concentration, cofactor concentration, buffer concentration, or any combination thereof, during labeling.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows fragmentations that can occur at fragile sites as a result of nicking, where nicks are closer to one another (FIG. 1A) or farther apart (FIG. 1B).

FIG. 2 shows the DNA length corresponding to the midpoint of a size histogram in which molecules are arranged from smallest to largest in length (or mass) (shown as “center of mass”), the percent of DNA molecules that are mapped against a reference genome (shown as “mapping to reference genome”), and the false positive and false negative rates for mapping to a sequenced reference genome compared to a simulation for the same (shown as “false positive” and “false negative”) in E. coli subjected to the following treatments: 1.) no repair, 2.) repair with PreCR as recommended by the manufacturer (New England BioLabs), 3.) repair with PreCR under conditions of omitting dGTP, 4.) repair with PreCR under conditions of omitting dATP and dGTP, and 5.) repair with Taq polymerase under conditions of omitting dGTP.

FIG. 3 shows center of mass, percent mapping to a reference genome, and false positive and false negative rates in E. coli subjected to the following treatments: 1.) no repair, or 2.) treatment with FEN I to remove flaps followed by a ligase to repair the translated nicks.

FIG. 4 shows center of mass, percent mapping to a reference genome, and false positive and false negative rates in Drosophila subjected to the following treatments: 1.) nicking with Nt.BspQI and PreCR repair, and 2.) nicking with Nb.BbvCI and PreCR repair.

FIG. 5 shows two-color genome mapping with two enzymes, including the layout of an IrysChip (5A), linearization in nanochannels (5B), distribution of labels at sequence-specific locations (5C), and the alignment of consensus maps (5D) as described in Example 4.

DETAILED DESCRIPTION

Maintaining and restoring the integrity of DNA strands is essential for obtaining long labeled molecules that are useful for complex genome mapping and information density. The methods described herein provide approaches to minimize the formation of fragile DNA sites and fragmentation of DNA, restore the structural integrity of DNA following the use of nicking approaches, and maximize the information content of DNA in order to generate high-resolution maps.

Described herein are approaches that can be used in conjunction with a nanochannel array to reproducibly and uniformly linearize DNA. In addition to improved noise characteristics (e.g., by virtue of keeping DNA in solution rather than affixed), these approaches can entail cycles of channel-loading and imaging to generate high-throughput DNA reads. Genome mapping on nanochannel arrays at the single-molecule level overcomes many of the limitations of preexisting technologies and is described in depth in Lam ET et al. (Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly, Nat Biotechnol 30: 771-776, 2012), which is hereby incorporated by reference in its entirety. In some embodiments described herein, a genome mapping approach that allows multiple motifs to be labeled with different colors is employed, significantly increasing information density.

In some embodiments, a high-resolution physical map is constructed. The physical map can be used to validate or correct a physical map generated using another method, such as SNaPshot fingerprinting technology. In some embodiments, the physical map is used to validate assembled regions and correct inaccuracies in sequence scaffolds. The physical map can also be used to facilitate de novo sequence assembly of a region by anchoring sequence scaffolds. In some embodiments, the physical map is used to produce a highly accurate and complete sequence assembly.

In some embodiments provided herein, nick labeling is used to prepare DNA for analysis. As part of the nick labeling process, nicks can move closer to one another (as shown in FIG. 1A) or farther apart (as shown in FIG. 1B). It has been discovered that fragile sites occur when two nicks are <1 Kb apart on opposite DNA strands. Fragmentation can occur at fragile sites due, for example, to: 1) mechanical manipulation, 2) heat required for labeling, 3) strand extension associated with labeling and certain kinds of repair (e.g., using the exonuclease activity of polymerases), or 4) shear forces associated with linearizing DNA molecules. In general, the shorter the distance between nicks, the more frequent the fragmentation, particularly if labeling decreases the original distance (FIG. 1A). As described herein, it has been found that repairing nicks can ameliorate the breakage of DNA.
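By way of non-limiting illustration, the fragile-site criterion described above (two nicks less than 1 Kb apart on opposite strands) can be sketched in silico as follows. The function name, the coordinate convention (positions in base pairs along the top strand), and the example positions are hypothetical and are not part of the disclosed methods. The "converging" flag assumes that extension proceeds 5′ to 3′ on each strand, so that a top-strand nick lying upstream of a bottom-strand nick moves toward it during labeling (compare FIG. 1A), and away from it otherwise (compare FIG. 1B).

```python
from typing import List, Tuple


def find_fragile_sites(
    top_nicks: List[int],
    bottom_nicks: List[int],
    max_gap_bp: int = 1000,
) -> List[Tuple[int, int, bool]]:
    """Return (top_pos, bottom_pos, converging) for opposite-strand nick pairs
    that lie closer together than max_gap_bp.

    Positions are assumed to be base-pair coordinates along the top strand.
    converging is True when the top-strand nick lies upstream of the
    bottom-strand nick, so that 5' to 3' extension on both strands would
    shorten the distance between them (assumed convention).
    """
    fragile = []
    for t in sorted(top_nicks):
        for b in sorted(bottom_nicks):
            if abs(t - b) < max_gap_bp:
                fragile.append((t, b, t < b))
    return fragile


# Hypothetical nick positions (bp) on each strand of one molecule.
pairs = find_fragile_sites([1000, 50000], [1400, 20000, 49500])
for top, bottom, converging in pairs:
    kind = "converging" if converging else "diverging"
    print(f"top {top} / bottom {bottom}: {abs(top - bottom)} bp apart, {kind}")
```

Such a screen could be used, for example, to estimate how many fragile sites a given nicking motif would be expected to create in a reference sequence before an experiment is run.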

In some embodiments, the methods described herein utilize nicking enzymes to create sequence-specific nicks that are subsequently labeled, for example by a fluorescent nucleotide analog. In some embodiments, the nick-labeled DNA is stained with an intercalating dye, loaded onto a nanofluidic chip by an electric field, and imaged. In some embodiments, the DNA is linearized by confinement in a nanochannel array, resulting in uniform linearization and allowing precise and accurate measurement of the distance between nick-labels on DNA molecules comprising a signature pattern. In some embodiments, DNA loading and imaging can be repeated in an automated fashion. In some embodiments, a second nicking enzyme is used. In some embodiments, this second nicking enzyme is used with a second label color.
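By way of non-limiting illustration, the signature pattern referred to above can be predicted in silico by locating a recognition motif on both strands of a sequence and computing the distances between adjacent sites. The following is a minimal sketch; the function names and the toy sequence are hypothetical, and the motif string used in the example is an assumption that should be checked against the enzyme supplier's documentation. The sketch reports where the motif occurs, not the exact position of the nick relative to the motif.

```python
from typing import List, Tuple

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def motif_sites(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every occurrence of motif.

    '+' means the motif reads 5' to 3' on the sequence as written;
    '-' means it reads 5' to 3' on the complementary strand.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    sites = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            sites.append((i, "+"))
        elif window == rc:  # elif avoids double-counting palindromic motifs
            sites.append((i, "-"))
    return sites


def label_intervals(sites: List[Tuple[int, str]]) -> List[int]:
    """Distances (bp) between adjacent motif sites, ignoring strand."""
    positions = sorted(pos for pos, _ in sites)
    return [b - a for a, b in zip(positions, positions[1:])]


# Toy sequence with one forward and one reverse-strand site (assumed motif).
toy_sequence = "AAGCTCTTCAAAAGAAGAGCTT"
sites = motif_sites(toy_sequence, "GCTCTTC")
print(sites)                   # [(2, '+'), (13, '-')]
print(label_intervals(sites))  # [11]
```

The list of intervals is the kind of pattern that could be compared against the distances measured between nick-labels on linearized molecules.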

In some embodiments, methods are provided to mitigate fragile site-based fragmentation. In some embodiments, reduced driving conditions are used to limit the rate of incorporation of a label, and therefore minimize fragmentation at the fragile sites. In some embodiments, reduced driving conditions are used to minimize shearing stress forces associated with DNA elongation. In some embodiments, drive is reduced by lowering the concentration of dNTPs, lowering reaction temperature, lowering cofactor concentration, adjusting buffer and salt concentration, or a combination thereof. Drive can also be reduced at the level of repair by stimulating the exonuclease activity of a polymerase with a high concentration of dNTPs, then limiting extension by restricting or omitting at least one nucleotide (which can be referred to as “choked repair”). In a preferred embodiment, a single species of dNTP (e.g., dATP) is incorporated at the nick site, the flap is removed with a flap nuclease without extension, and ligation is performed.

In some embodiments, a suboptimal temperature for a thermophilic polymerase is used to reduce driving conditions. In some embodiments, the reaction temperature is about 35° C. to about 75° C., such as 35° C., 36° C., 37° C., 38° C., 39° C., 40° C., 41° C., 42° C., 43° C., 44° C., 45° C., 46° C., 47° C., 48° C., 49° C., 50° C., 51° C., 52° C., 53° C., 54° C., 55° C., 56° C., 57° C., 58° C., 59° C., 60° C., 61° C., 62° C., 63° C., 64° C., 65° C., 66° C., 67° C., 68° C., 69° C., 70° C., 71° C., 72° C., 73° C., 74° C., or 75° C. In preferred embodiments, the temperature is between about 50° C. and about 55° C., between about 55° C. and about 60° C., between about 60° C. and about 65° C., or between about 50° C. and about 65° C.

In some embodiments, the polymerase used herein is thermostable. In some embodiments, the polymerase is mesophilic. In some preferred embodiments, the polymerase does not have a proofreading capability. In some preferred embodiments, the polymerase has a strand displacement capability. In some preferred embodiments, the polymerase has a 5′ to 3′ exonuclease activity. In some preferred embodiments, the polymerase does not have proofreading ability, but does have a strand-displacement capability and a 5′ to 3′ exonuclease activity.

In some embodiments, nickases that target the same sequence motif but nick at opposite strands are used to target specific DNA strands to minimize the formation of fragile sites. In some embodiments, nickases have been modified to only bind to one strand of a double-stranded DNA. In some embodiments, nickases are used to target a single strand from a first DNA molecule, and a single strand from a second DNA molecule. In some of these embodiments, a single strand from the first DNA is targeted by a first nickase, and the complementary strand from the second DNA molecule is targeted with a second nickase that recognizes the same sequence motif as the first nickase. In some embodiments, the orientation of extension is reversed for one of the strands. For example, in some embodiments, extension from the site of nicking occurs in one direction for a first DNA molecule, and in the opposite direction for a second DNA molecule. In some embodiments, extension from the site of nicking occurs in one direction for a top strand of a DNA molecule, and in the opposite direction for the bottom strand for the same DNA molecule.

In some embodiments, a reference map is used for assembly as described herein.

In some embodiments, a plurality of nickases are used to maximize information density. In some embodiments, molecules nicked by the plurality of nickases are assembled using a reference map.

In some embodiments, more than one nicking step is used to maximize information density. In some embodiments, the molecule or molecules subjected to more than one nicking step are assembled using a reference map.
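By way of non-limiting illustration, the gain in information density obtained by labeling more than one motif can be expressed as the number of labeled sites per 100 kb. The following is a minimal sketch; the function names and site positions are hypothetical, and the integer "color" index is simply an assumed way of keeping track of which label marks which site.

```python
from typing import List, Sequence, Tuple


def label_density(positions: Sequence[int], molecule_length_bp: int,
                  per_bp: int = 100_000) -> float:
    """Number of labeled sites per `per_bp` base pairs of molecule."""
    if molecule_length_bp <= 0:
        raise ValueError("molecule_length_bp must be positive")
    return len(positions) * per_bp / molecule_length_bp


def combined_map(*site_lists: Sequence[int]) -> List[Tuple[int, int]]:
    """Merge site lists into one position-sorted list of (position, color),
    where color is the index of the list the site came from."""
    merged = []
    for color, sites in enumerate(site_lists):
        merged.extend((pos, color) for pos in sites)
    return sorted(merged)


# Hypothetical site positions (bp) for two nickases on a 100 kb molecule.
first_motif = [10_000, 60_000]
second_motif = [25_000, 40_000, 90_000]
two_color = combined_map(first_motif, second_motif)

print(label_density(first_motif, 100_000))                   # 2.0
print(label_density([pos for pos, _ in two_color], 100_000)) # 5.0
```

In this toy case the second motif raises the density from two to five sites per 100 kb, which corresponds to a mean spacing of 20 kb instead of 50 kb.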

In some embodiments, DNA is linearized. Means of linearizing DNA can include the use of shear force of liquid flow, capillary flow, convective flow, an electrical field, a dielectrical field, a thermal gradient, a magnetic field, combinations thereof (e.g., the use of physical confinement and an electrical field), or any other method known to one of skill in the art. In some embodiments, the channel(s) described herein have a cross sectional dimension in the micrometer range. In some preferred embodiments, channels have a cross sectional dimension in the nanometer range. Examples of nanochannels and methods incorporating the use of nanochannels are provided in U.S. Publication Nos. 2011/0171634 and 2012/0237936, which are hereby incorporated by reference in their entireties.

In some embodiments, a second motif is investigated in a molecule of interest. In some embodiments, the second motif includes at least one binding site for a binding entity selected from a non-cutting restriction enzyme, a zinc finger protein, an antibody, a transcription factor, a transcription activator like domain, a DNA binding protein, a polyamide, a triple helix forming oligonucleotide, and a peptide nucleic acid. In some embodiments, marking or tagging of the second motif is effected with a binding entity comprising a second label. In some embodiments, marking is performed with a label that does not cut or nick the DNA. In some embodiments, tagging is performed with a label that does not cut or nick the DNA.

In some preferred embodiments, the second motif includes at least one binding site for a peptide nucleic acid. In some embodiments, tagging is effected with a peptide nucleic acid comprising a second label. In other embodiments, the second motif includes at least one recognition sequence for a methyltransferase. In some embodiments, tagging is performed with a methyltransferase. In some embodiments, tagging is performed with a methyltransferase comprising a modified cofactor which includes a second label.

In some embodiments, a modified cofactor is used. In some embodiments, the modified cofactor contains a second label that functions as a transferable tag which becomes covalently coupled to a methyltransferase recognition sequence. In other embodiments, the modified cofactor contains a second label that is directly coupled to a methyltransferase recognition sequence.

In some embodiments, the labels described herein are selected from a fluorophore, a quantum dot, a dendrimer, a nanowire, a bead, a hapten, a streptavidin, an avidin, a neutravidin, a biotin, or a reactive group. In some preferred embodiments, the first and second labels described herein are selected from a fluorophore or a quantum dot.

In some embodiments, labeling is carried out with a polymerase in the presence of at least one labeled dNTP using the process of nick translation. The labeled dNTP preferably contains a fluorophore or a quantum dot. In some embodiments, labeling is carried out as described in U.S. Provisional Application No. 61/713,862, which is hereby incorporated by reference in its entirety.

In some embodiments, the polymerase used herein leaves a flap region that is removed to generate a ligatable nick prior to repair. In some preferred embodiments, repair is carried out with a DNA ligase. Examples of DNA ligases include Taq DNA ligase, E. coli DNA ligase, T7 DNA ligase, T4 DNA ligase, and 9° N DNA ligase (New England Biolabs). In some embodiments, the flap region is removed with an endonuclease. For example, in some preferred embodiments, the flap region is removed with a flap endonuclease (e.g., FEN I). In some embodiments, the flap region is removed with an exonuclease. In some preferred embodiments, the flap region is removed using the 5′ to 3′ exonuclease activity of a polymerase. In some preferred embodiments, the flap region is removed using the 5′ to 3′ exonuclease activity of a polymerase under conditions where at least one of four nucleotides (e.g., dATP, dGTP, dCTP, dTTP/dUTP) is provided in limited concentration. In some preferred embodiments, the flap region is removed using the 5′ to 3′ exonuclease activity of a polymerase under conditions where at least one of the four nucleotides is omitted. In some preferred embodiments, the flap region is removed using the 5′ to 3′ exonuclease activity of a Taq polymerase. In some embodiments, the flap is removed to restore ligatability of the translated nick. In some embodiments, the flap region is removed and the nick is repaired using a mixture of enzymes that perform these functions, such as PreCR enzyme mix (New England BioLabs). In some embodiments, the PreCR enzyme mix is used under conditions where at least one of the four nucleotides is provided in limited concentration or omitted.

Nucleotides that are not omitted during the flap removal process can be present at a concentration of about 25 nM to about 50 nM each, about 50 nM to about 100 nM, about 100 nM to about 200 nM, about 200 nM to about 400 nM, about 400 nM to about 800 nM, about 800 nM to about 1.6 µM, about 1.6 µM to about 3.2 µM, about 3.2 µM to about 6.4 µM, about 6.4 µM to about 12.8 µM, about 12.8 µM to about 25.6 µM, about 25.6 µM to about 51.2 µM, about 51.2 µM to about 102.4 µM, about 102.4 µM to about 204.8 µM, about 204.8 µM to about 409.6 µM, about 409.6 µM to about 819.2 µM, about 819.2 µM to about 1638.4 µM, or about 1638.4 µM to about 3276.8 µM. In some preferred embodiments, the concentration of nucleotides that are not omitted is about 50 µM to about 500 µM each. In some preferred embodiments, the nucleotides that are present are present in equimolar amounts.

In some embodiments, the at least one nucleotide that is limited in concentration is at a concentration at least 2× less, at least 5× less, at least 10× less, at least 20× less, at least 30× less, at least 60× less, at least 100× less, at least 500× less, at least 1000× less, or at least 3000× less than at least one of the other nucleotides that is present. In some embodiments, the at least one nucleotide that is limited in concentration is at a concentration that is negligible compared to the nucleotides that are present. In some preferred embodiments, the at least one nucleotide that is limited in concentration is at a concentration at least 100× less than the nucleotides that are present.
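By way of non-limiting illustration, the fold difference between a limited nucleotide and the nucleotides that are present can be checked with simple arithmetic when a reaction mix is planned. The following is a minimal sketch; the function name and the example concentrations are hypothetical, and an omitted nucleotide is represented as a concentration of zero, which is treated here as an unbounded fold reduction.

```python
from typing import Dict


def fold_reduction(concentrations_uM: Dict[str, float], limited: str) -> float:
    """Fold difference between the most concentrated other nucleotide and the
    limited nucleotide. Returns infinity when the limited nucleotide is
    omitted (concentration 0)."""
    others = [c for name, c in concentrations_uM.items() if name != limited]
    if not others:
        raise ValueError("at least one other nucleotide must be present")
    limited_conc = concentrations_uM[limited]
    if limited_conc == 0:
        return float("inf")
    return max(others) / limited_conc


# Hypothetical mix: three dNTPs at 100 uM each and dGTP limited to 1 uM.
limited_mix = {"dATP": 100.0, "dCTP": 100.0, "dTTP": 100.0, "dGTP": 1.0}
print(fold_reduction(limited_mix, "dGTP"))  # 100.0

# Hypothetical mix with dGTP omitted entirely.
omitted_mix = {"dATP": 100.0, "dCTP": 100.0, "dTTP": 100.0, "dGTP": 0.0}
print(fold_reduction(omitted_mix, "dGTP"))  # inf
```

The first mix would satisfy the "at least 100× less" condition, while the second corresponds to the case in which the nucleotide is omitted.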

In some embodiments, a method for repairing flap-containing DNA is provided. In some embodiments, at least one nucleotide is omitted prior to DNA characterization. For example, in some embodiments, the method entails treating a double stranded DNA containing at least one flap on either strand of the DNA with a 5′ to 3′ exonuclease activity of a polymerase under conditions wherein at least one nucleotide is omitted, ligating the nicks to restore strand integrity at the flap regions, and characterizing the DNA. In some embodiments, at least one nucleotide is limited in concentration prior to DNA characterization. For example, in some embodiments, the method entails treating a double stranded DNA comprising at least one flap on either strand of the DNA with a 5′ to 3′ exonuclease activity of a polymerase under conditions wherein at least one nucleotide is limited in concentration, ligating the nicks to restore strand integrity at the flap regions, and characterizing the DNA.

Methods for characterizing the molecules described herein include any method for determining the information content of the DNA, such as sequencing, mapping, single nucleotide polymorphism (SNP) analysis, copy number variant (CNV) analysis, haplotyping, or epigenetic analysis.

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art.

The DNA described herein can be of any length (e.g., 0.1 kb to a megabase). The DNA can be a highly pure preparation, crude, or semi-crude material. The DNA can come from any biological source or can be synthetic.

As used herein, the term “polymerase” refers to any enzyme, naturally occurring or engineered, that is capable of incorporating native and modified nucleotides in a template dependent manner starting at a 3′ hydroxyl end.

As used herein, the term “nicking endonuclease” refers to any enzyme, naturally occurring or engineered, that is capable of breaking a phosphodiester bond on a single DNA strand, leaving a 3′-hydroxyl at a defined sequence. Nicking endonucleases can be engineered by modifying restriction enzymes to eliminate cutting activity for one DNA strand, or produced by fusing a nicking subunit to a DNA binding domain, for example, zinc fingers and DNA recognition domains from transcription activator-like effectors.
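
By way of illustration only, the positions at which a nicking endonuclease would be expected to recognize a DNA can be predicted in silico by searching both strands of a sequence for the recognition motif. The sketch below is a minimal example under stated assumptions: the motif GCTCTTC used in the usage example is the recognition sequence listed by the vendor for Nt.BspQI and is not recited in this disclosure, the function names are hypothetical, and the sketch reports where the motif begins rather than the exact nicked bond.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif_sites(seq, motif):
    """List (start_index, strand) for every occurrence of `motif`.

    "+" means the motif reads 5' to 3' on the given strand; "-" means it
    reads 5' to 3' on the complementary strand. Indices are 0-based and
    refer to the given strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    width = len(motif)
    sites = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == motif:
            sites.append((i, "+"))
        if window == rc_motif:
            sites.append((i, "-"))
    return sites


if __name__ == "__main__":
    # Assumed motif for Nt.BspQI (vendor-listed, not from this disclosure).
    print(find_motif_sites("AAGCTCTTCAATTGAAGAGCTT", "GCTCTTC"))
```

A list of predicted sites of this kind is one possible form of the reference pattern against which a detected label pattern could be compared.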

EXAMPLES

The following examples are intended to illustrate, but not to limit, the invention in any manner, shape, or form, either explicitly or implicitly. While they are typical of those that might be used, other procedures, methodologies, or techniques known to those skilled in the art may alternatively be used.

Example 1

E. coli genomic DNA was nicked with Nt.BspQI nicking endonuclease. The nicked DNA was labeled with Taq polymerase by nick translation using Atto dUTP or Alexa dUTP in the presence of cold dATP, dGTP, and dCTP. The labeled nicks were: 1.) not repaired, 2.) repaired with PreCR as recommended by the manufacturer (New England BioLabs), 3.) repaired with PreCR under conditions of omitting dGTP, 4.) repaired with PreCR under conditions of omitting dATP and dGTP, or 5.) repaired with Taq polymerase under conditions of omitting dGTP. Ligation was then performed with a ligase. The resulting DNA was stained with YOYO-1 (Life Technologies) and processed on the Irys system (BioNano Genomics). Briefly, DNA was linearized in massively parallel nanochannels, excited with the appropriate laser for backbone and label detection, and optically imaged. Mapping to a reference genome, center of mass, and False Positive (FP) and False Negative (FN) calculations were carried out using nano Studio data analysis software (BioNano Genomics). Results are shown in FIG. 2.
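
By way of illustration only, one way in which false positive and false negative rates could be computed for a molecule that has already been aligned to a reference is sketched below. This is not the algorithm of the analysis software named above, which is not described in this disclosure. The definitions used (FP as the fraction of detected labels with no reference site within a tolerance, FN as the fraction of reference sites with no detected label within that tolerance), the 1000 bp default tolerance, the greedy one-to-one matching, and the function names are all assumptions.

```python
def count_matches(detected, reference, tol):
    """Greedily pair sorted detected labels with sorted reference sites.

    A detected label and a reference site are paired when they lie within
    `tol` of each other; each label and each site is used at most once.
    """
    detected = sorted(detected)
    reference = sorted(reference)
    i = j = matched = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tol:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # detected label precedes the site: an extra label
        else:
            j += 1  # reference site has no nearby label: a missing label
    return matched


def fp_fn_rates(detected, reference, tol=1000.0):
    """Return (FP rate, FN rate) under the assumed definitions above."""
    matched = count_matches(detected, reference, tol)
    fp = (len(detected) - matched) / len(detected) if detected else 0.0
    fn = (len(reference) - matched) / len(reference) if reference else 0.0
    return fp, fn


if __name__ == "__main__":
    # Hypothetical positions in bp along an aligned molecule.
    reference_sites = [10_000, 25_000, 40_000, 60_000]
    detected_labels = [10_200, 24_700, 33_000, 60_400]
    print(fp_fn_rates(detected_labels, reference_sites))
```

In the hypothetical usage example, one detected label has no reference site nearby and one reference site has no label, so both rates are one in four.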

Example 2

E. coli genomic DNA was nicked with Nt.BspQI nicking endonuclease. The nicked DNA was labeled with Taq polymerase by nick translation using Atto dUTP. The labeled DNA was: 1.) left unrepaired or 2.) treated with FEN I to remove flaps followed by a ligase to repair the translated nicks. The DNA was linearized in massively parallel nanochannels, excited with the appropriate laser for backbone and label detection, and optically imaged. Mapping to a reference genome, center of mass, and False Positive (FP) and False Negative (FN) calculations were carried out using nano Studio data analysis software (BioNano Genomics). Results are shown in FIG. 3.

Example 3

Drosophila genomic DNA was nicked with Nt.BspQI or Nb.BbvCI nicking endonuclease. The nicked DNA was labeled with Taq polymerase by nick translation using Atto dUTP. The labeled DNA was treated with PreCR reagent (New England BioLabs) to repair the nicks. The resulting DNA was stained with YOYO-1 (Life Technologies) and processed on the Irys system (BioNano Genomics). Mapping to a reference genome, center of mass, and False Positive (FP) and False Negative (FN) calculations were carried out using nano Studio data analysis software (BioNano Genomics). Results are shown in FIG. 4.

Example 4

A genome map was constructed using two nicking enzymes, Nt.BbvCI and Nt.BspQI, whose nick motifs were labeled with red and green dyes, respectively, across 27 BACs making up a minimal tiling path (MTP) of a 2.1-Mb region containing the prolamin multigene family in the Aegilops tauschii genome. FIG. 5A shows the layout of the IrysChip (BioNano Genomics).

The YOYO-stained DNA was loaded into the port, unwound within the pillar structures, and linearized inside 45 nm nanochannels (FIG. 5B). After image processing, individual BAC molecules with red and green labels distributed at sequence-specific locations were compared and clustered into pools with similar map patterns (FIG. 5C, top). Density plots for the BAC clones were generated to determine the consensus peak locations (FIG. 5C, bottom). The consensus maps of individual BAC clones were aligned based on overlaps of consensus maps of adjacent BACs (FIG. 5D) to create a genome map of the entire region.
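
By way of illustration only, the step of determining consensus peak locations from a density plot can be sketched as a histogram over the label positions of the molecules in one cluster. The sketch assumes that the molecules have already been oriented and shifted to a common origin, that positions are given in kb, and that a peak is a bin that is a local maximum and is occupied in at least a chosen fraction of molecules. The function name, the 0.5 kb bin width, and the 0.5 fraction threshold are hypothetical and do not describe the image processing actually used.

```python
from collections import Counter


def consensus_peaks(molecules, bin_kb=0.5, min_fraction=0.5):
    """Return consensus label positions (kb) for one color channel.

    molecules:    list of lists of label positions in kb, one list per
                  molecule, all in a shared coordinate system.
    bin_kb:       histogram bin width in kb.
    min_fraction: minimum fraction of molecules that must carry a label
                  in a bin for that bin to be reported.
    """
    if not molecules:
        return []
    counts = Counter()
    for labels in molecules:
        # Count each molecule at most once per bin.
        for b in {int(pos // bin_kb) for pos in labels}:
            counts[b] += 1
    peaks = []
    for b, c in sorted(counts.items()):
        if c / len(molecules) < min_fraction:
            continue  # too few molecules support this bin
        # Keep bins that are local maxima of the density.
        if c >= counts.get(b - 1, 0) and c > counts.get(b + 1, 0):
            peaks.append((b + 0.5) * bin_kb)  # report the bin center
    return peaks


if __name__ == "__main__":
    # Hypothetical label positions (kb) on four molecules of one BAC clone.
    cluster = [
        [10.1, 25.2, 40.3],
        [10.2, 25.1, 31.0, 40.2],
        [10.3, 40.1],
        [10.15, 25.3, 40.25],
    ]
    print(consensus_peaks(cluster))
```

In the hypothetical example, the label seen on only one molecule near 31 kb falls below the threshold and is not reported, while the three recurring positions are. Fixed bins can split a peak that straddles a bin boundary, so a smoothed density estimate may be preferable in practice.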

The two-color labeling strategy resulted in an average information density of one label per 4.8 kb (437 labels in 2.1 Mb). Since each motif was marked by its own color, peaks of different motifs could be distinguished from each other even if their peaks were almost overlapping (arrow in FIG. 5D). Peaks of the same motif (i.e., the same color) could be resolved when they were at least ~1.5 kb apart. Taking advantage of the combination of long molecule lengths (~140 kb average), high-resolution, accurate length measurement, and multiple sequence motifs, a high-quality genome map of the 2.1-Mb region for scaffold assembly was generated.
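
By way of illustration only, the label density figure and the resolution rule stated above can be checked with a few lines of arithmetic. The function names are hypothetical. The 1.5 kb same-color threshold is the approximate value given above, and treating labels of different colors as always distinguishable is a simplification of the statement that such peaks could be told apart even when almost overlapping.

```python
def mean_label_spacing_kb(region_mb, n_labels):
    """Average distance between labels, in kb, over a region given in Mb."""
    if n_labels <= 0:
        raise ValueError("n_labels must be positive")
    return region_mb * 1000.0 / n_labels


def resolvable(pos_a_kb, color_a, pos_b_kb, color_b, min_sep_kb=1.5):
    """Return True if two labels can be told apart under the stated rule.

    Labels of different colors are treated as always distinguishable;
    labels of the same color must be at least `min_sep_kb` apart.
    """
    if color_a != color_b:
        return True
    return abs(pos_a_kb - pos_b_kb) >= min_sep_kb


if __name__ == "__main__":
    # 437 labels over the 2.1-Mb region described above.
    print(round(mean_label_spacing_kb(2.1, 437), 1))
    print(resolvable(10.0, "red", 10.3, "green"))
    print(resolvable(10.0, "red", 10.3, "red"))
```

With 437 labels in 2.1 Mb, the average spacing is 2100 kb / 437, which rounds to 4.8 kb.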

REFERENCES

1. Blakesley R, Hansen N, Gupta J, McDowell J, Maskeri B, et al. (2010) Effort required to finish shotgun-generated genome sequences differs significantly among vertebrates. BMC Genomics 11: 21.

2. Chain P S G, Grafham D V, Fulton R S, FitzGerald M G, Hostetler J, et al. (2009) Genome Project Standards in a New Era of Sequencing. Science 326: 236-237.

3. Lee H, Tang H (2012) Next-generation sequencing technologies and fragment assembly algorithms. Methods Mol Biol 855: 155-174.

4. Green E D (2001) Strategies for the systematic sequencing of complex genomes. Nat Rev Genet 2: 573-583.

5. The International Human Genome Mapping Consortium (2001) A physical map of the human genome. Nature 409: 934-941.

6. Smith D B, Flavell R B (1975) Characterisation of the wheat genome by renaturation kinetics. Chromosoma 50: 223-242.

7. Venter J C, Adams M D, Myers E W, Li P W, Mural R J, et al. (2001) The Sequence of the Human Genome. Science 291: 1304-1351.

8. Zuccolo A, Sebastian A, Talag J, Yu Y, Kim H, et al. (2007) Transposable element distribution, abundance and role in genome size variation in the genus Oryza. BMC Evolutionary Biology 7: 152.

9. The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815.

10. International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436: 793-800.

11. Zhou S, Wei F, Nguyen J, Bechner M, Potamousis K, et al. (2009) A single molecule scaffold for the maize genome. PLoS Genet 5: e1000711.

12. Schnable P S, Ware D, Fulton R S, Stein J C, Wei F, et al. (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115.

13. Luo M C, Thomas C, You F M, Hsiao J, Ouyang S, et al. (2003) High-throughput fingerprinting of bacterial artificial chromosomes using the snapshot labeling kit and sizing of restriction fragments by capillary electrophoresis. Genomics 82: 378-389.

14. Paux E, Sourdille P, Salse J, Saintenac C, Choulet F, et al. (2008) A Physical Map of the 1-Gigabase Bread Wheat Chromosome 3B. Science 322: 101-104.

15. Philippe R, Choulet F, Paux E, van Oeveren J, Tang J, et al. (2012) Whole Genome Profiling provides a robust framework for physical mapping and sequencing in the highly complex and repetitive wheat genome. BMC Genomics 13: 47.

16. van Oeveren J, de Ruiter M, Jesse T, van der Poel H, Tang J, et al. (2011) Sequence-based physical mapping of complex genomes by whole genome profiling. Genome Research 21(4): 618-625.

17. Schwartz D C, Li X, Hernandez L I, Ramnarain S P, Huff E J, et al. (1993) Ordered restriction maps of Saccharomyces cerevisiae chromosomes constructed by optical mapping. Science 262: 110-114.

18. Teague B, Waterman M S, Goldstein S, Potamousis K, Zhou S, et al. (2010) High-resolution human genome structure by single-molecule analysis. Proc Natl Acad Sci U S A 107: 10848-10853.

19. Lam E T, Hastie A, Lin C, Ehrlich D, Das S K, et al. (2012) Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol 30: 771-776.

20. Mun J H, Kwon S J, Yang T J, Kim H S, Choi B S, et al. (2008) The first generation of a BAC-based physical map of Brassica rapa. BMC Genomics 9: 280.

21. Zhou S, Bechner M C, Place M, Churas C P, Pape L, et al. (2007) Validation of rice genome sequence by optical mapping. BMC Genomics 8: 278.

22. Nagarajan N, Read T D, Pop M (2008) Scaffolding and validation of bacterial genome assemblies using optical restriction maps. Bioinformatics 24: 1229-1235.

23. Howden B P, Seemann T, Harrison P F, McEvoy C R, Stanton J A, et al. (2010) Complete genome sequence of Staphylococcus aureus strain JKD6008, an ST239 clone of methicillin-resistant Staphylococcus aureus with intermediate-level vancomycin resistance. J Bacteriol 192: 5848-5849.

24. Riley M C, Lee J E, Lesho E, Kirkup B C, Jr. (2011) Optically mapping multiple bacterial genomes simultaneously in a single run. PLoS One 6: e27085.

25. Lin H C, Goldstein S, Mendelowitz L, Zhou S, Wetzel J, et al. (2012) AGORA: Assembly Guided by Optical Restriction Alignment. BMC Bioinformatics 13: 189.

26. Xiao M, Phong A, Ha C, Chan T-F, Cai D, et al. (2007) Rapid DNA mapping by fluorescent single molecule detection. Nucleic Acids Research 35: e16.

27. Das S K, Austin M D, Akana M C, Deshpande P, Cao H, et al. (2010) Single molecule linear analysis of DNA in nano-channel labeled with sequence specific fluorescent probes. Nucleic Acids Research 38: e177.

28. Dvorak J (2009) Triticeae Genome Structure and Evolution. Genetics and Genomics of the Triticeae Springer Science.

29. Li W, Zhang P, Fellers J P, Friebe B, Gill B S (2004) Sequence composition, organization, and evolution of the core Triticeae genome. Plant J 40: 500-511.

30. Cassidy B G, Dvorak J (1991) Molecular characterization of a low-molecular-weight glutenin cDNA clone from Triticum durum. Theoretical and Applied Genetics 81: 653-660.

31. Hernandez P, Martis M, Dorado G, Pfeifer M, Galvez S, et al. (2012) Next-generation sequencing and syntenic integration of flow-sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content. Plant J 69: 377-386.

32. Leroy P, Guilhot N, Sakai H, Bernard A, Choulet F, et al. (2012) TriAnnot: A Versatile and High Performance Pipeline for the Automated Annotation of Plant Genomes. Front Plant Sci 3: 5.

33. Brenchley R, Spannagl M, Pfeifer M, Barker G L, D'Amore R, et al. (2012) Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature 491: 705-710.

34. Li Y, Zheng H, Luo R, Wu H, Zhu H, et al. (2011) Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly. Nat Biotechnol 29: 723-730.

35. Soderlund C, Longden I, Mott R (1997) FPC: a system for building contigs from restriction fingerprinted clones. Comput Appl Biosci 13: 523-535.

36. Warren R L, Varabei D, Platt D, Huang X, Messina D, et al. (2006) Physical map-assisted whole-genome shotgun sequence assemblies. Genome Res 16: 768-775. 

1-36. (canceled)
 37. A method of characterizing a DNA, the method comprising: nicking a first DNA at a first sequence motif, wherein the first DNA is double stranded, and wherein the first DNA remains double-stranded adjacent to the nicks; labeling the nicks on the first DNA with a first label; marking the labeled first DNA with a third label, wherein the third label is non-sequence-specific, and wherein the third label is different from the first label; linearizing the first DNA following labeling with the first and third labels; and detecting the pattern of the first label on the linearized first DNA.
 38. The method of claim 37, further comprising repairing the nicks on the first DNA.
 39. The method of claim 37, further comprising: nicking a second DNA at the first sequence motif; labeling the nicks on the second DNA with the first label; repairing the nicks on the second DNA; marking the repaired second DNA with the third label; linearizing the second DNA following labeling with the first and third labels; and detecting the pattern of the first label on the linearized second DNA.
 40. The method of claim 37, further comprising: nicking the repaired first DNA at a second sequence motif, wherein the repaired first DNA remains double-stranded adjacent to the nicks; labeling the nicks at the second sequence motif on the first DNA with a second label, wherein the third label is different from the second label; and repairing the nicks on the first DNA following labeling with the second label.
 41. The method of claim 40, wherein the first label and second label comprise the same label.
 42. The method of claim 39, wherein the first DNA and the second DNA are each from the same source.
 43. The method of claim 37, wherein the linearizing comprises transporting the DNA into a nanochannel.
 44. The method of claim 37, further comprising comparing the pattern of the first labels to a pattern of labels on a reference DNA.
 45. The method of claim 37, wherein nicking the first DNA comprises nicking with Nt.BspQI.
 46. The method of claim 37, wherein the first label is selected from the group consisting of a fluorophore, a quantum dot, a dendrimer, a nanowire, a bead, a hapten, a streptavidin, an avidin, a neutravidin, a biotin, and a reactive group.
 47. The method of claim 37, wherein the first label does not comprise a fluorophore, and wherein the first label does not comprise a quantum dot.
 48. A method of characterizing DNA, the method comprising: nicking one strand of a first DNA at a recognition sequence with a first nicking endonuclease, wherein the first DNA is double stranded, and wherein the first DNA remains double-stranded adjacent to the nicks; labeling the first DNA at the nicking sites with a first label; repairing the nicks on the first DNA; nicking a complementary strand of a second DNA at the recognition sequence with a second nicking endonuclease, wherein the complementary strand of the second DNA is complementary to the one strand of the first DNA, wherein the second DNA is double stranded, and wherein the second DNA remains double-stranded adjacent to the nicks; labeling the second DNA at the nicking sites with a second label; repairing the nicks on the second DNA; marking the repaired first and second DNA with a third label, wherein the third label is non-sequence specific; linearizing the marked first DNA and marked second DNA; and detecting a pattern of the first and second label on the linearized first DNA and linearized second DNA.
 49. The method of claim 48, further comprising: nicking one strand of a third DNA at a recognition sequence with the first nicking endonuclease, wherein the third DNA is double stranded, and wherein the third DNA remains double-stranded adjacent to the nicks; labeling the third DNA at the nicking sites; repairing the nicks on the third DNA; nicking a complementary strand of a fourth DNA at the recognition sequence with the second nicking endonuclease, wherein the complementary strand of the fourth DNA is complementary to the one strand of the third DNA; labeling the fourth DNA at the nicking sites; repairing the nicks on the fourth DNA; and marking the repaired third and fourth DNAs with a third label, wherein the third label comprises a non-sequence-specific label.
 50. The method of claim 48, wherein the first DNA and second DNA are each from the same source.
 51. The method of claim 49, wherein the first DNA and second DNA are each from a first source, and wherein the third and fourth DNA are each from a second source.
 52. The method of claim 48, wherein the first label and second label comprise the same label.
 53. The method of claim 48, wherein the first label is selected from the group consisting of a fluorophore, a quantum dot, a dendrimer, a nanowire, a bead, a hapten, a streptavidin, an avidin, a neutravidin, a biotin, and a reactive group.
 54. The method of claim 48, wherein the first label does not comprise a fluorophore, and wherein the first label does not comprise a quantum dot.
 55. The method of claim 48, wherein the linearizing includes transporting the DNA into a nanochannel.
 56. A method of characterizing a DNA comprising a double-stranded DNA comprising at least one base flap on either strand of the DNA, the method comprising: treating the double-stranded DNA with a 5′ to 3′ exonuclease activity of a polymerase under conditions wherein at least one species of dNTP is present in limited concentration or omitted compared to other dNTPs that are present; ligating the nicks to restore strand integrity at flap regions; and characterizing the DNA. 